Snowflake
Querying Overview
Snowflake querying is based on standard SQL, supporting both ANSI SQL and analytic extensions for data analysis. Snowflake allows combining structured and semi-structured data, such as JSON, in the same query for flexible analysis. A general reference for querying can be found at https://docs.snowflake.com/en/sql-reference-commands.
Qualifying the Context
When defining queries in Qarbine it is beneficial to know what context information was set in the defined Qarbine data service. The Qarbine administrator can set the database and schema within the data service definition. When generally querying tables in Snowflake, whether you need to use a schema prefix (like public or any other schema name) depends on your current session context:
If the data service has a specified schema, you do not need to prefix the schema name in your queries. Otherwise, you may need to qualify your table names with the schema name (for example, public.my_table). Snowflake will use the default schema for your user or the PUBLIC schema if your user’s default schema is not set or is PUBLIC.
If the data service has not specified a database, you must use a fully qualified name:
<database_name>.<schema_name>.<table_name>
If you have set a database but not a schema, you must use
<schema_name>.<table_name>
General Datatype Handling
Qarbine retrieves standard datatypes such as numerics, strings, booleans, and dates in their expected JavaScript form. For more information on datatypes see ttps://docs.snowflake.com/en/sql-reference-data-types
Snowflake provides a wide variety of functions to support querying and other data interactions. For more information see https://docs.snowflake.com/en/sql-reference-functions.
JSON Handling
JSON data is generally stored in VARIANT or OBJECT data type columns. In Qarbine this data is returned in JSON object format (vs. simple text). For more information on this see https://docs.snowflake.com/en/sql-reference/data-types-semistructured
Below is information on a sample SNOWFLAKE_LEARNING_DB table.
A sample row is shown below.
Notice that the MENU_ITEM_HEALTH_METRICS_OBJ is a true JavaScript object in the row. WIth Qarbine, there is no extra programming required to interact with the object in its native form.
Vector Searching
Vector searching provided a means to find similar rows based on embeddings which have been computed for text. This is in contrast to exact string matches or regular expression matching. For details on the VECTOR datatype see https://docs.snowflake.com/en/sql-reference/data-types-vector
Snowflake Cortex provides four vector similarity functions:
- VECTOR_INNER_PRODUCT,
- VECTOR_L1_DISTANCE,
- VECTOR_L2_DISTANCE, and
- VECTOR_COSINE_SIMILARI.TY
See this page for links to these functions https://docs.snowflake.com/en/sql-reference/functions-vector
For details about embeddings and their use in similarity searching see https://docs.snowflake.com/en/user-guide/snowflake-cortex/vector-embeddings
Given this table definition
CREATE OR REPLACE TABLE documents (
doc_id INTEGER,
doc_text VARCHAR,
doc_vec VECTOR(FLOAT, 768)
);
It can be populated using
INSERT INTO documents (doc_id, doc_text)
VALUES
(1, 'How do I install the Snowflake CLI?'),
(2, 'What are the best practices for data security in Snowflake?'),
(3, 'How can I optimize my Snowflake queries?'),
(4, 'What is the Snowflake Data Cloud?');
-- 3. Generate and store embeddings
UPDATE documents
SET doc_vec = SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', doc_text)
WHERE doc_vec is null;
You can add additional rows as desired.
A sample vector query is shown below.
SELECT doc_id, doc_text,
VECTOR_COSINE_SIMILARITY( doc_vec,
SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
'How to set up Snowflake?')
) AS similarity
FROM documents
ORDER BY similarity DESC
LIMIT 3;
Below is a sample answer set when the above is run in the Data Source Designer.
Qarbine variables can easily be used in this query to provide a more dynamic interaction.Adjusting the query specification to have a variable placeholder as shown below
SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
@userPhrase)
) AS similarity
results in a prompt for the variable
Enter a phase
and click
An answer set is shown below.
Array Handing
Querying columns with ARRAY datatype returns the array of values. For example, given this table definition
CREATE OR REPLACE TABLE products (
product_id INTEGER,
product_name VARCHAR,
sizes ARRAY
);
and inserted row
INSERT INTO products (product_id, product_name, sizes)
SELECT
1,
'T-Shirt',
ARRAY_CONSTRUCT('XS', 'S', 'M', 'L', 'XL');
the query
SELECT * FROM products;
returns the answer set
The answer set row is shown below.
Notice that the SIZES are in their natural JavaScript object array form.
Snowflake extends standard SQL with specialized syntax and functions to query semi-structured data directly. For example, users can access nested fields using dot notation or the :: casting operator, and flatten arrays with the FLATTEN function to convert nested data into a relational format. This allows for efficient querying and analysis of semi-structured data within the same SQL environment as structured tables.
For example, this query
SELECT product_id, product_name, f.value AS size
FROM products, LATERAL FLATTEN(sizes) f;
returns the answer set
In general such answer set explosion is not at all necessary in Qarbine as the array can be interacted with in its native JavaScript form.
Manipulating Row Shape
Qarbine can be directed to further modify the answer set elements This is done using ‘pragmas” which are discussed in the general Data Source Designer documentation. The most common pragma likely to be used is “pullFieldsUp CSV_list_of_fields”.
This answer set can be simplified by using Qarbine pragmas as shown below.
SELECT * FROM COLORS
The answer set is shown below.
A sample answer set row is shown below.
This answer set can be simplified by using Qarbine pragmas as shown below.
#pragma pullFieldsUp JSON_DATA
select * from COLORS
The answer set is shown below.
A sample answer set row is shown below.
The fields that were previously within the JSON_DATA field object have been pulled up a level. This makes it more convenient to reference field values in a template formula. Access to the result row values of interest in a template formula may now use either
@current.category
or
#category
This example can be found at “example/Snowflake/JSON colors 1 with pragma”.
The attribute field values across an answer set may be very diverse so the Data Source Designer’s columns may be incomplete.
The above sample data is based on the article at
https://www.snowflake.com/en/blog/automating-snowflakes-semi-structured-json-data-handling/
Object Handling
The OBJECT datatype is handled by Qarbine in a similar way as the JSON datatype described above. Given this table definition
CREATE OR REPLACE TABLE employee_info (
employee_id INTEGER,
info OBJECT
);
Snowflake provides the OBJECT_CONSTRUCT function to build an OBJECT from key-value pairs. Data can be inserted using
INSERT INTO employee_info (employee_id, info)
SELECT
101,
OBJECT_CONSTRUCT('name', 'Alice', 'age', 30, 'department', 'Engineering');
Running this query
SELECT * FROM employee_info;
returns the answer set
The first row is shown below.
The INFO fields can be pulled up using the technique described above.
Miscellaneous Queries
To get a list of databases run
show databases
To get the current database run
select CURRENT_DATABASE()
To get a list of tables run
show table
To get a table summary run
desc table PRODUCTS
To get the columns of a table run a query similar to
show columns in products
To obtain clustering key information run
select SYSTEM$CLUSTERING_INFORMATION('PRODUCTS')
To get information on database users run
show users
Troubleshooting
You can obtain the low level query being sent from Qarbine to Snowflake by pressing the ALT key and clicking the run icon. There are several Snowflake company and third party querying tools available to test your SQL.
Snowsight (Web UI) is the primary web interface for Snowflake, offering a modern, feature-rich environment for writing and running queries and visualizing results. A good starting point can be found at https://docs.snowflake.com/en/user-guide/ui-snowsight-quick-tour
SnowSQL is a command-line tool for executing SQL, managing Snowflake objects, and running scripts. A good starting point can be found at https://docs.snowflake.com/en/user-guide/snowsql